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MARKOV CHAINS WITH POLYNOMIAL EIGENFUNCTIONS 
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We provide a sharp nonasymptotic analysis of the rates of conver- 
gence for some standard muhivariate Markov chains using spectral 
techniques. All chains under consideration have multivariate orthog- 
onal polynomial as eigenfunctions. Our examples include the Moran 
model in population genetics and its variants in community ecol- 
ogy, the Dirichlet-multinomial Gibbs sampler, a class of generalized 
Bernoulli-Laplace processes, a generalized Ehrenfest urn model and 
the multivariate normal autoregressive process. 

1. Introduction. The theory of Markov chains is one of the most useful 
tools of applied probability and has numerous applications. Markov chains 
are used for modeling physical processes and evolution of a population in 
population genetics and community ecology. Another important use is sim- 
ulating from an intractable probability distribution. It is a well-known fact 
that under mild conditions discussed in [2], a Markov chain converges to 
its stationary distribution. In the applications mentioned above, often it 
is useful to know exactly how many steps it takes for a Markov chain to 
be reasonably close to its stationary distribution. Answering this question 
as accurately as possible is what finding "rates of convergence" of Markov 
chains is about. 

In the current paper, we provide a sharp nonasymptotic analysis of rates of 
convergence to stationarity for a variety of multivariate Markov chains. This 
helps determine exactly what number of steps is necessary and sufficient for 
convergence. These Markov chains appear as standard models in population 
genetics, ecology, statistics and image processing. 

Here is an example of our results. In community ecology, scientists study 
diversity and species abundances in ecological communities. The Unified 
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Neutral Theory of Biodiversity and Biogeography (UNTB) is an important 
theory proposed by ecologist Stephen Hubbeh in his monograph [27]. There 
are two levels in Hubbell's theory, a metacommunity and a local community. 

The metacommunity has constant population size Nm and evolves as fol- 
lows. At each step, a randomly chosen individual is replaced by a new one. 
With probability s (speciation), the new individual is a new species that 
never occurs before. With probability 1 — s (no speciation), the new individ- 
ual is a copy of one (randomly chosen) of the remaining Nm — 1 individuals. 
After a sufficiently long time, the metacommunity reaches equilibrium which 
is the celebrated Ewen's sampling distribution. This process can be consid- 
ered as a variant of the so-called infinite-allele Moran model in population 
genetics [18]. In population genetics, the reproducing individual could be 
the same as the dying one. 

The local community has constant population size N , which is much 
smaller than Nm in scale. The evolution of the local community is sim- 
ilar to the metacommunity except it has migration instead of speciation. 
Specifically, at each step, one individual is randomly chosen to die. With 
probability m (migration), the new individual is an immigrant randomly 
chosen from the metacommunity. With probability 1 — m (no migration), 
the new individual will be a copy of one (randomly chosen) of the remaining 

— 1 individuals in the local community. Again this process is a variant 
of the so-called multi-allele Moran model in population genetics [18]. The 
metacommunity evolves at a much larger time scale and is assumed to be 
fixed during the evolution of the local community. 

Since the publication of [27], UNTB received both acclaims and criticisms. 
For example, in [38], ecologist McGill tests some classical datasets against 
the equilibrium species abundance of the local community predicted by the 
UNTB. McGill raised the "time to equilibrium" issue which is of both practi- 
cal and scientific interests. In order to generate the equilibrium distribution 
of the local community, he actually simulates the evolution process of the 
local community on computer. As he claims ([38], page 882): 

No indication of what fixed number of time steps is appropriate has been 
pubhshed. My own experiments show that it can be large. . . 

Also scientifically it is desirable to know how soon a local community reaches 
equilibrium. One hundred years? One thousand years? 

A simulation of the local community process is performed on computer. 
Suppose that the metacommunity has d = 5 species with uniform species 
frequencies p = (0.2,0.2,0.2,0.2,0.2). The local community has population 
size = 20 and the migration probability is m = 0.05. Assume that initially 
all 20 individuals in the local community are of the same species. Five thou- 
sand independent replicas of the local community process are simulated for 
1000 steps. For any (random) count vector X= (Xi,...,X^) of the local 
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community, where Xi is the count of individuals of species i, we can define 
a Watterson type statistics (population homogeneity) as 



The empirical distributions of the Watterson statistics are plotted along 
time in Figure 1. By visual inspection, we suspect that the distribution of 
Watterson statistics is close to its stationary distribution after a few hundred 
steps. A commonly used measure for distance to stationarity for Markov 
chains is the chi-square distance. Consider a Markov chain with state space 
X, transition density and stationary density m with respect to a 

cj-finite measure ^. The chi-square distance to stationarity after £ steps, 
starting at state x, is defined as 



Proposition 4.10 provides rigorous quantitative answers to the question of 
how long it takes the local community process to be close to its stationary 
distribution. Let XArei(0 t>e the chi-square distance between the distribution 
of the d-dimensional vector of local community species abundances after 
H. steps and the stationary distribution, assuming initially all individuals 
are of species i. Proposition 4.10 tells us that 595 steps are necessary and 
sufficient for convergence. This means that, for / < 595 steps, XArei(0 high 
and, for / > 595 -|- 190c (c is any positive constant) steps, XArej(0 ^ For 
example, by / = 595 -|- 190 x log 100 ~ 1470 steps, the chi-square distance 
Xatg, (0 ^ 0.01. If this is a tree population with mortality rate 1% per year, 
595 and 1470 steps translate into 2975 and 7350 years, respectively. For 
people who prefer total variation distance, it should be kept in mind that the 
chi-square distance always produces an upper bound for the total variation 
distance (see Section 2.1). Figure 2 shows how the (exact) chi-square distance 
for the d-dimensional Markov chain is decreasing over time. 

The calculations work because the Markov chain corresponding to the 
local community process admits a system of multivariate orthogonal poly- 
nomials as eigenf unctions. Then a summation formula due to Griffiths (re- 
viewed in Section 2.2.2) pertinent to this system of orthogonal polynomials 
allows us to do explicit calculations for the chi-square distance. 

We provide similar results for all other examples considered in this paper. 
For every Markov chain, we find positive constants D and R, which depend 
on various parameters of the Markov chain, such that after D — cR steps 
the chi-square distance is larger than an explicit constant multiple of e'^, 
and after D + cR steps the chi-square distance to stationarity is less than an 
explicit constant multiple of e~'^ or similar simple functions. In this sense, we 
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Simulation of Ihe local conn m unity process 
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Fig. 1. Empirical distributions of the Watterson statistics over time from 5000 simu- 
lations of the local community process. The parameters are N = 20, d = 5,m = 0.05 and 
Pi = 1/5. Starting state is Nei. 
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Fig. 2. Chi-square distance of the local community process over time. The parameters 
are N = 20, d = 5, m = 0.05 and Pi = l/5. Starting state is Nei. 
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say that D steps are necessary and sufficient for convergence in clii-square 
distance. 

Tlie paper is organized as follows. Section 2 provides the required back- 
ground on the rates of convergence for Markov chains and some multivariate 
orthogonal polynomials. Section 3 gives simple criteria to verify that a re- 
versible Markov kernel has orthogonal polynomials as eigenfunctions. Section 
4 contains a wide spectrum of applications. In every example we analyze the 
rates of convergence of the Markov chain starting at a natural initial state. 
The frameworks in Sections 4.1 and 4.2 allow treatment of seemingly dif- 
ferent Markov chains in a unified way. Specific examples, for example, the 
multivariate Moran model, the Dirichlet-multinomial Gibbs sampler and 
Bernoulli-Laplace processes extend previous results [6, 11, 15, 16] for the 
univariate case. Section 4.3 contains analysis of a class of Ehrenfest urn mod- 
els which generalize [35]. In Section 4.4, we analyze the multivariate normal 
autoregressive process which arises in the multigrid Monte Carlo method 
[22] and general overrelaxation MCMC algorithms [3, 4, 40]. 

We realize that the Markov chains being actually used at the forefront 
of today's research are quite complicated and it is still a serious research 
effort to provide useful analysis for the rates of convergence of such Markov 
chains. Still, our examples are standard and easy to understand models and 
it is nice to have a sharp analysis of the rates of convergence of these chains. 

2. Background. 

2.1. Convergence rates of Markov chains. Let {X,J^) be a measurable 
space equipped with a cj-finite measure ji. Suppose we are given a Markov 
chain on state space X described by its transition density K{x,x') with 
respect to fi{dx'). Suppose further that the chain has stationary measure 
m{dx) =m{x)ii{dx). Let denote the density of the chain started 

at state x after I steps. The chi-square distance between K\x,-) and the 
stationary measure m is defined by 



Intuitively the chi-square distance penalizes more the discrepancies at points 
which have smaller probability mass (density) at stationarity. The commonly 
used total variation distance is defined by 



While total variation distance is always bounded in interval [0, 1] , the chi- 
square distance assumes values in [0, oo]. By Cauchy-Schwarz inequality, the 
chi-square distance gives an upper bound for the total variation distance 
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Let P{m) := {f:X ^R: J;^ p{x)m{x)n{dx) < 00} denote the Hilbert 
space equipped with inner product 



{f,9)p{m)='Em[f{X)g{X)]= / f{x)g{x)m{x)^x{dx). 
The Markov chain K operates on l'^{m) by 

Kf{x)= I K{x,y)f{y)f,{dy). 



X 

K is called reversible when m{x)K{x,x') = m{x')K{x' ,x) for all G X, 
or equivalently, when the operator K is self-adjoint: 

Suppose that /^(m) admits an orthonormal basis of eigenfunctions {cl)n}n>o 
with (/>o = 1 such that 

K<j)n{x) = (3n(l)n{x), n > 0, 

where the eigenvalues {Pn}n>o satisfy /3o = 1, |/9n| < 1, and I^J^i Pn < °o. 
Then, X is a Hilbert-Schmidt operator and 

oo 

K\x,x) = ^ [3\^(t)ri{x)4>n{x')'m{x ) [convergence in f'{m x m)]. 

n=0 

Also, 

oo 

(2.1) xl{l) = Y.^n4>l{x). 

n=l 

The central identity in our analysis throughout the paper is (2.1). The chal- 
lenge is to work with the eigenvalues {/3n}n>o and eigenfunctions {(^n}n>o 
and manipulate the right-hand side in (2.1) to obtain sharp rates of conver- 
gence. 

2.2. Multivariate orthogonal polynomials. Before introducing multivari- 
ate orthogonal polynomials we first set up the definitions of some standard 
multivariate distributions. To ease the notational burden, we will use bold- 
face letters to indicate vectors. For a vector x = (xi, . . . , Xd) G IK'^, 

did 
I x| — ^ ^ Xj , I Xj I — ^ ^ Xj , I X ' I — ^ ^ Xj . 

i=l j=l j=i 

The multinomial coefficient is denoted by 

|x|\ 1x1! 



X / Xl\---Xd\' 
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Increasing and decreasing factorials are denoted by 

a(fc) = a{a + 1) • • • (a + A; — 1), a[fc] = a{a — 1) • • • {a — k + 1). 
a(o) =0(0] = 1 by convention. For vectors x = (xi , . . . , x^) and n = (ni , . . . , n^) , 

d d d 

x" = n^rS = [](xi )(„.), X[n] =[|(xi)[„,]. 

i=l i=l i=l 

Three spaces that play important roles later are listed below: 



^d 


= {p= 




[0,1]'': 


|P| = 1}, 


\>d 


= {x = 


(xi,. 


..,Xrf)E<:|x| 


= N}, 


■yd 


= {x = 


(xi,. 


..,Xrf) gNo:|x| 


= iV,0<Xi</i} 



Dirichlet distribution with parameters q = (ai, . . . , a^), Oj > 0, has sup- 
port A'^ and density 

P-^' ""'l°>- r(.o".TM ffi?-"- "^^'^ 

In Bayesian statistics, Dirichlet distribution is the conjugate prior for the 
multinomial distribution. 

Multinomial distribution, with parameters > and p € A'^, has support 
Xfi and probability mass function 

(2.3) A^(x|iV,p)=(^)npr, xG^^. 



1=1 



Dirichlet-multinomial distribution, with parameters > and cx = (ai, . . . . 
ad),ai > 0, is the Dirichlet I?(p|q;) mixture of multinomial A^(-|A^,p) and 
has probability mass function 

(2.4) VMi.\N.a) = ni^''"-''-' = "^'3''' . xeA-t 



The same distribution is called multivariate negative hypergeometric dis- 
tribution in [32], page 179. When a = (1, . . . , 1), it is the well-known Bose- 
Einstein distribution. Note as ct/jaj -^p = {pi,. . . ,pct) G A*^, PA^(x|A^, a) 
7W(x|7V,p). 

Let £ = (/i, . . . , Zd) be a vector of positive integers and d < N <\l\. Replac- 
ing ai by —li for 1 < i < d in (2.4), we obtain the hypergeometric distribution 
with parameter N and i: 

(2.5) H(x|Ar,^)=f^)%f^ = ^%^, xG<, 
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Classically the hypergeometric distribution occurs from sampling A'^ balls 
without replacement from a pool of \£\ balls with composition £. 

Multinomial, Dirichlet-multinomial and hypergeometric distributions have 
alternative interpretations in a unified framework called the Polya (or Polya- 
Eggenberger) urn scheme. Chapter 40 of [32] is dedicated to the properties 
and history of the Polya urn model. An urn initially contains h balls of 
color 1, I2 balls of color 2,..., Id balls of color d. In a typical Polya urn 
scheme, a ball is randomly drawn. The color of the ball is noted and the 
ball is returned to the urn along with c additional balls of the same color. 
This experiment is repeated N times and the distribution of the compo- 
sition of the observed N balls is called a Polya-Eggenberger distribution. 
When c = 0, this becomes sampling with replacement and the distribution 
is multinomial with parameters N and p = £/\i\. When c > 0, the distribu- 
tion is Dirichlet-multinomial with parameters N and a = £/c. When c = — 1, 
this becomes sampling without replacement and the distribution is hyperge- 
ometric with parameters and £. In recent literature, Polya type processes 
allow much more general replacement schemes. In Section 4.1, the Moran 
model in population genetics, its variants in community ecology, and the 
Dirichlet-multinomial Gibbs sampler are put in a unified framework called 
the sequential Polya urn models. 

The multivariate normal distribution is used when we study the multi- 
variate normal autoregressive process in Section 4.4. The density of a multi- 
variate normal distribution with mean vector G M"^ and covariance matrix 
S at a column vector x is given by 

AA(x|/X, S) = ^ g-(x-M)^S-l(x-/x)/2^ ^ g j^d 

V(2vr)'^|S| 

Next we review the relevant developments of multivariate orthogonal poly- 
nomials for these classical multivariate distributions. These polynomials oc- 
cur as the eigenfunctions for the Markov chains analyzed in this paper. 

2.2.1. Review of some explicit multivariate orthogonal polynomials. Iliev 
and Xu [28] explicitly construct systems of orthogonal polynomials on vari- 
ous multivariate discrete weight functions which correspond to classical mul- 
tivariate probability distributions. Most pertinent to us are the multivariate 
Hahn polynomials. We present their construction below. The parameters in 
[28] are shifted by one from ours. 

Multivariate Hahn polynomials. For any (ni,...,nrf) S X^, we let n = 
(ni, . . . , nrf_i) be the index vector and define multivariate polynomials on 



Qn(x;iV,a) 



(2.6) 



where 
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^U^n(-A'+|x,-.KIn-|),„,, 

X Q„^,(xj;iV- |xj_i| - |nJ+^|, 



Qn{x;N,a,f3)=3F2 



-n, n + a + (3 — 1, —x 
a,-N 



j=0 



(a)y)(-iV)(,.)j! 



is the classical univariate Hahn polynomial. The univariate Hahn polynomi- 
als satisfy the orthogonality relation 

^VM{-\N,a,p) [Qn{X; N, a, p)Qm{X; N, a, /?)] 

(2.7) 

(A^ + a + /?)(„)(/?)(„) 



(^) {2n + a + (3-l){a + /?) („_!)(«)(„) 

A survey of the univariate Hahn polynomials is in [29], Section 6.2. The 
following proposition is essentially Theorem 5.4 in [28]. 

Proposition 2.1. The system (2.6) satisfies the orthogonality relation 
^VM{-\N,c.) [Qn(X; N, a)Qn' (X; N, a)] 
(2.8) = Qn{^;N,a)Qn>{^;N,a)VM{^\N,a) 



where 

d: 

(2.9) 



n 



(|Q|+A^)(|n|) 
(^)[|n|]l«l(2|n|) 

(la^'l + |n^| + |n^-+ i| - 1)(„^.)(|q^'+^| + 2|nJ-+^|)(„^)n,! 

^ 11 



(«j)(n,) 



In probabilistic language, the construction of (2.6) uses the stick-breaking 
property of the Dirichlet-multinomial distribution. Essentially the same re- 
sult was implied in an earlier work by Karlin and McGregor [36]. 

Remark 2.2. Here are a few properties of the orthogonal system (2.6): 
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1. Qyi is a polynomial in xi, . . . ,Xd^i of degree |n|. 

2. Qn{Ned;N,cx) = l. 

3. When d = 2, we recover the univariate Hahn polynomials. 

The orthogonality relation (2.8) does not depend on the positivity of 
cti, 1 < ^ < Therefore switching parameter range gives the orthogonal poly- 
nomials on the multivariate hyper geometric distribution. Note that the sam- 
ple space (and thus the index set) is a subset of ^Y^. 

Proposition 2.3. The system (2.6), with Ui = — /j, l<i<d, is orthog- 
onal with respect to the hypergeometric distribution 7i{-\N,£) (2.5). 

Multivariate Krawtchouk polynomials. Recall the well-known limiting re- 
lation 

Q4x-,N,pt,il-p)t) = ±^---h)in + '-^\d-^h 



j=0 

( —n.—x 1 



-71, —X 

-N 



Knix;N,p) 



as t ^ oo, where K^s are the univariate Krawtchouk polynomials orthogo- 
nal for the binomial distribution with parameters and p. Properties of the 
univariate Krawtchouk polynomials are documented in [29], page 183. Tak- 
ing limits q;/|q!| ^ p = {pi, ■ . ■ ,Pd) G in the orthogonality relation of the 
multivariate Hahn polynomials (2.8), we obtain the orthogonal polynomials 
(multivariate Krawtchouk) for the multinomial distribution A^(-|A^, p). 

Proposition 2.4. The system 



K„(x;iV,p) = Lil^ [](-Ar+ |x,_i| + |nJ'+i|)(„^.) 

(2.10) 



xi^„^.(^x,-;iV-|x,_i|-|n^'+^|,^ 

satisfies the orthogonality relation 

Eai(.|7V,p) [^n(X; iV, p)K„, (X; iV, p)] 

= i^n(x;iV,p)i^n'(x;iV,p)A^(x|iV,p) 
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where 

The systems of orthogonal polynomials for a fixed multinomial distri- 
bution are not unique. A more general construction of the multivariate 
Krawtchouk polynomials was given by Griffiths [23] and recently surveyed in 
[43]. The system defined by (2.10) is a special case in this general framework. 

Multivariate Jacobi polynomials. Next we record the multivariate or- 
thogonal polynomials for the Dirichlet distribution. Recall that, for the uni- 
variate Hahn polynomials, 



Qn{x;N,a,P)=3F2 

n 

= E 

3=0 
n 

E 



-n,n + a + P — 1,—x 
a,-N 



1 

(-n)(j-)(n + a + /5 - l)(j-)(-a;)(j) 



{-n)(j){n + a + (3 - l)(j) 
n,n + a + f3 — I 



2F1 



a 



= Jn{z;a,f3) 



as x/N — > z. {Jn}o<n<oo SLice the shifted Jacobi polynomials which are or- 
thogonal for the beta distribution with parameters a and (3. Taking limits 
x/A^ — > z = {zi, . . . , Zd) G A"^ in the multivariate Hahn polynomials (2.6), we 
obtain a system of multivariate polynomials on A*^: 

(2.12) J„(z;a) = n|z^|"^J„^(^|||;a,,|a^-+i|+2|n^'+'|), 

zG A'^, nGN[J-\ 

Proposition 2.5. The system (2.12) satisfies the orthogonality relation 
Eo(.|a) [ Jn(Z; q) Jn' (Z; a)] 

= I Jn(z;Q;)Jn'(z;Q:)P(z|Q:)(iz 
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where 

|a|(2|n|) 



(2.13) 



A stick-breaking construction of (2.12) was known earlier (see, e.g., [17, 
37]). 

Hermite polynomials. We require univariate Hermite polynomials to carry 
out the analysis of the multivariate normal autoregressive process. The eigen- 
functions of this process (after appropriate variable transformations) turn 
out to be products of appropriate univariate Hermite polynomials. The uni- 
variate Hermite polynomials are defined by 

(2.14) ff„M=„,gtli_J_, „>„. 

They satisfy the orthogonality relation 

K,n{y)Hn{y)e~y dy = 2^n\5mn. 

VTT J-oo 



An important multilinear generating function formula ([29], Example 4.7.3), 
gives 

(2 15) V :^lMt" = 1 ^2xH/(i+t) 



2.2.2. Griffiths' construction of kernel polynomials. In the applications 
in Section 4, we need to manipulate certain sums of products of the relevant 
multivariate orthogonal polynomials. For a fixed multivariate distribution 
m, the kernel polynomials are defined as 

/i„,(x,y)= ^?n(x)Q°(y) 

|n|=n 

for any complete system of orthonormal polynomials {Q^} in l'^{m). The 
kernel polynomials are invariant under the choice of orthogonal polynomial 
system. Fortunately, these sums can be carried out in closed form for the sys- 
tems of polynomials that we consider. In this section, we review the work by 
Griffiths, who explicitly constructed the kernel polynomials for the Dirichlet- 
multinomial [25], Dirichlet [24, 25] and the multinomial [26] distributions. 
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Proposition 2.6. For x, y G and <n< N, the kernel polynomials 
for the Dirichlet-multinomial distribution DM{-\N,a) (2.4) are 

/i„(x,y) = (|q;| +2n- 1) ^ 

(2.16) 

( 

m\(n — m)\ 

m,=0 ^ ' 

where 



JL (\a\ + m)(„-T\ 



^ ^ ^ ^'"^ '"^^ " |,^. I ^ / nf«.(,) (|a| +iV)M(|«| +iV)M • 

For example, the first two kernel polynomials are 
^o(x,y) = 1; 

(|q!| + l)(|a| + A^) / |a| {ai + Xi){ai + yi) \ 

= N [JRTW h ) ■ 

When calculating convergence rates of Markov chain starting from state 
Nei, we need to evaluate the quantity 

, , _ ^[«] (|Q| + 2ra-l)(|Q|)(„_i)(|Q!| 

(^+|a|)(n) n\{ai)^n) 

(2.18) 

'N\ (|a| + 2n- 1)(|q|)(„_i)(|q;| -««)(„) 



nj (|q| +iV)(„)(ai)(„) 

The first equality follows from (2.9), (2.13) and (2.21) below. 

Replacing Oj by —li, 1 <i <d, in (2.16) and (2.17), we obtain the kernel 
polynomials for the hypergeometric distribution Ti.(-\N,i) (2.5). For exam- 
ple, when li> N and x = y = A^e^, 



hniNei,Ne,) 



N\ (|£|-2n + l)|£|[„,i](|£| -/,)[, 
^) (|£|-iV)M(^i)N 



Writing Xi = Nwi, yi = Nzi and taking limit iV ^ oo in (2.17), we recover 
the kernel polynomials for the Dirichlet distribution. 

Proposition 2.7. For w,z € A'^ andO <n < oo, the kernel polynomials 
for the Dirichlet distribution T>{-\a) (2.2) are 

(2.19) /j„(w, z) = (|a| + 2n - 1) X: (-I)""'" ^ ""^^"/^ U(w, z), 

m\(n — my. 
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where 

(2.20) e™(w,z) = ("i)Tp^i[i^^'^y 



\e.\=m 



^ ' Ui=iai^,) i=i 



The kernel polynomials for the Dirichlet distribution were derived in [24] 
where the transition density of the multi-allele Wright-Fisher diffusion pro- 
cess was expanded in terms of (2.19). When w = z = e^, 

n\ ^0 \m) (a,)(™,) 

cx\+2n-l) |q;|(„) ^ A (-n)(„)(|a| + n - 1)(^) 



^ ' n\ \cx\ + n - ' 



m=0 



(aj)(m) 



|a| +2n- 1)(|q|)(„_i)(|q;| -ai)(^ 



?^Kai)(n) 

where the last equality uses the Chu~Vandermonde summation formula. 

The kernel polynomials for the multinomial distribution were given in the 
work by Griffiths [26]. 

Proposition 2.8. For x,y G and <n< N, the kernel polynomials 
for the multinomial distribution A^(-|A^, p) (2.3) are 

,2.22) ''.i^.y)-t(:i)(:::)i-n. 



m=0 



where 



E 

\e.\=m 



When X = y = iVe j , 

■N\ ^l-p^Y 



(2.23) hn{Nei,Nei 



n 



Pi 



3. Markov chains with orthogonal polynomial eigenfunctions. There is 
a large class of Markov chains with polynomial eigenfunctions, for exam- 
ple, birth-death processes [41], Cannings exchangeable model [5] in popu- 
lation genetics, certain two-component Gibbs samplers [11] and of course 
all Markov chains considered in this paper. When the Markov kernel is re- 
versible, often the orthogonal polynomials for the stationary distribution 
come up as eigenfunctions. We present simple tools for identifying cases 
when orthogonal polynomials are the eigenfunctions of a reversible Markov 
kernel, first in the univariate case and then the multivariate case. 
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Lemma 3.1. Suppose tt is a univariate distribution and /^(vr) admits an 
orthogonal basis of polynomials {qn{x)}o<n<c where c = #supp(7r), that is, 
qn{x) is a polynomial in x of exact degree n and 

{Qn,qm)p(Tr) = = dlSnm- 

If K is a Markov kernel reversible on vr and 

(3.1) Ej^(2. .-([X"] = f3nx" + terms in x of degree <n, <n < c, 
then: 

1. K has eigenvalue f3n with eigenfunction qn, that is, Kq^ = f3nqn for < 
n < c. 

2. The chi-square distance between K\x,-) and vr is 

n>l 

Proof, go is a constant function and trivially Kqo = qo with eigenvalue 
/3o = 1. By hypothesis (3.1), Kqn = (3nqn + J2 m<n'^tnqm- But the Coefficients 
am = {Kqn, qm)i2(n) = {Qn, KQ'm)«2(^) = siucc Kq^ is a polynomial of degree 
m and can be expanded in terms of the basis polynomials of degree <m <n 
which are all orthogonal to g„. Therefore Kq^ = f3nqn- The expression for 
the chi-square distance follows from (2.1). □ 

Under mild assumptions, this result can be generalized into a multivariate 
case. We recall that the notation |n| denotes the sum of coordinates of a 
vector n. 

Lemma 3.2. Suppose tt is a multivariate distribution and P{ir) admits 
an orthogonal basis of multivariate polynomials "[gn(x)} where qn is a poly- 
nomial of exact degree |n| and 

{qn,qm)l2{n) = 'E^[qn{X)qm{X)] = d^J^^. 

If K is a Markov kernel reversible with respect to vr and 

(3.2) Ej^(x,-)[^"] = /3|n|x" + terms in x of degree < |n|, 
then: 

1. K has eigenvalue f3n with corresponding eigenbasis {qn}\n\=n- 

2. The chi-square distance between X'(x, •) and vr is 

n>l 
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where 

/i„(x,y):= qn(pi.)qn{y)dn^ 

\n\=n 

is the kernel polynomial of degree n for vr. 

Proof, qq is a constant function and trivially Kq^ = qo with eigenvalue 
Po = 1- By hypothesis (3.2), Kq^ = /3|n|Qn + E| m|<|n| '^m'Zm- But the coefR 
dents Cm = {Kqn,qm)p[-K) = {Qn, Kqin)^^,^) = since Kq^i is a polynomial 
of degree |m| and can be expanded in terms of the basis polynomials of 
degree < |m| < |n|, which are all orthogonal to q^. Therefore Kq^i = /3|n|9n- 
The expression for the chi-square distance follows from (2.1). □ 

Remark 3.3. When checking conditions (3.1) or (3.2), often it is easier 
to calculate the factorial moments. Because of the simple relation = 
X]fc=o ^)^'^) where s{n,k) are the Stirling numbers of the first kind and 
especially s{n,n) = 1, the condition (3.2) is equivalent to 

Ei^(x,.)[X[n]] = /3|n|X[n] + terms in x of degree < |n|. 

The key condition (3.2) seems restrictive but holds for surprisingly many 
multivariate Markov chains which possess certain intrinsic symmetry. Most 
examples in this paper satisfy (3.2). See [43] for a class of multinomial chains 
with orthogonal polynomial eigenfunctions but which in general do not sat- 
isfy (3.2). 

The trick of preserving polynomials has a long history in population ge- 
netics (see, for example, [5, 19]). But most models in population genetics, for 
example, Wright -Fisher model, are irreversible and thus orthogonal polyno- 
mials do not come up. An exception is the Moran process which we study 
and generalize in Section 4.1. 

4. Applications. 

4.1. Sequential Poly a urn models. In this section, we study a class of 
multivariate Markov chains which can be described in terms of the classical 
Polya urns. They are all reversible with respect to the Dirichlet-multinomial 
distribution and have the multivariate Hahn polynomials as eigenfunctions. 
Interestingly, the classical multi-allele Moran process in population genetics, 
local community process in community ecology, and the Dirichlet-multinomial 
Gibbs sampler in statistics can be treated as special cases in this unified 
framework. The urn description also provides a convenient way for simula- 
tion of the processes on a computer. 
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We first define the Polya type urns. A newly constituted urn contains one 
ball of color i and weight for 1 <i <d, that is, total of d balls with total 
weight \ct\. A Polya type draw is defined as a random draw according to 
weights and the ball is returned to the urn along with one additional ball of 
the same color and of unit weight. Initially a batch of N balls of composition 
{Xqi , . . . , Xod), that is, Xoi balls of color i and each of unit weight, are added 
into a newly constituted urn. We define three classes of Polya urn models. 
In the following, s is a fixed integer between and N. 

Definition 4.1. Polya level models: One step of the Markov chain in- 
cludes three mini-steps. First randomly mark s balls among the N balls 
(excluding the d original balls) to be removed. Before removal, do s Polya 
type draws (including the d original balls). Then the s marked balls are re- 
moved. Apparently the number of balls in the urn is kept constant after each 
step and the d original balls are always in the urn. Let = {Xn, . . . ,Xtd) 
be the composition of balls (excluding the d original balls) after t steps. 
{X(}t>o forms a multivariate Markov chain on X^, which we call a Polya 
level model. 

Definition 4.2. Polya down-up models: one-step of the Markov chain 
includes two mini-steps. First randomly choose s balls among the N balls 
(excluding the d original balls) and remove them. Then do s Polya type 
draws (including the d original balls). The number of balls in the urn is 
kept constant and the d original balls are always in the urn. Let Xf = 
{Xfi, . . . ,Xtd) be the composition of N balls (excluding the d original balls) 
after t steps. {li.t}t>o forms a multivariate Markov chain on X^, which we 
call a Polya down-up model. 

Definition 4.3. Polya up-down models: one-step of the Markov chain 
includes two mini-steps. First do s Polya type draws (including the d original 
balls). Then randomly choose s balls among the N + s balls (excluding the d 
original balls) and remove them. The number of balls in the urn is kept con- 
stant and the d original balls are always in the urn. Let X^ = {Xn, . . . , Xtd) 
be the composition of A^ balls (excluding the d original balls) after t steps. 
{X(}t>o forms a multivariate Markov chain on Xfj, which we call a Polya 
up-down model. 

All three classes of the sequential Polya models have the same stationary 
distribution. The detailed balance is checked in a straightforward way. 

Lemma 4.4. All three classes of Polya urn models are reversible with 
respect to the Dirichlet-multinomial distribution PA^(-|A, a) (2.4), where 
a = {ai,...,ad)- 
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Using the trick of preserving polynomials (Lemma 3.2), it is easy to check 
that all three classes of Polya urn models take the multivariate Hahn poly- 
nomials (2.6) as eigenfunctions. Recall that \ct\ =J2i=i'^i- 

Theorem 4.5. 1. The Polya level model has eigenvalue 

- s)[k]S[n-k] 



n 



,eoVA;;%.](iv+iai)(„_fc) 

(4.1) 



with multiplicity C^"*"^"^) and corresponding eigenbasis {Qn}\n\=n) where 
Qn 05^6 the multivariate Hahn polynomials (2.6). 

2. The Polya down-up model has eigenvalue 

iV[„](iV-s + |a|)(„) 

with multiplicity C^"*"^"^) and corresponding eigenbasis {Qn}|n|=ra) where 
Qn CLf^ the multivariate Hahn polynomials (2.6). 

3. The Polya up-down model has eigenvalue 

_ iVM(iV + . + |a|)H 
^"-(iV + .)[„](iV + |a|)(.)' 

with multiplicity C^^^"^) and corresponding eigenbasis {Qn}|n|=n) where 
Qn <ire the multivariate Hahn polynomials (2.6). 

4. For all three classes of Polya models, the chi-square distance between 
-ftr'(x, •) and the stationary distribution is 

N 

(4.2) X^(0 = E/?n^n(x,x), 

n=l 

where /?„ are the eigenvalues for the corresponding process and hn are 
the kernel polynomials (2.16) for the Dirichlet-multinomial distribution. 



Proof. For sake of space, we only provide the proof for the Polya level 
models. Proofs for the other models are similar. For the Polya level model 
with parameter s, it is observed that, given Xt = x, 'Kt+i = x — Y + Z, 
where Y is multivariate hypergeometric Ti.{-\s,x), Z is Dirichlet-multinomial 
T>M{-\s,a -\- x), and Y is independent of Z. Joint factorial moments of 
multivariate hypergeometric and Dirichlet-multinomial distributions are well 
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known. With x = (xi, . . . , Xd-i) and n = (ni, . . . , Ud-i) 



Eir(x,-)^[n] 



= E(x-Y + Z)[„] 



= E n 5 E(--Y)[n-k]EZ[k] 



0<k<njr=l ^ ^ 





(^^+|a|)(|k|) 



(x + Q;)(k) 




^[|nhfc](A^+|a|)(fe) 



(^-g)[|n|-fc]g[fc] 



X[n] + terms in x of degree < |n|. 



Then the claims follow from Lemma 3.2. □ 

Remark 4.6. For all three classes of Polya urn models, the eigenval- 
ues depend on the parameter a = (ai, . . . ,ad) only through its sum \a\. 
Formulas for the eigenfunctions require complete knowledge of a. 

Complete spectral information allows for sharp results in convergence rate 
in chi-square distance. In case s = 1, an argument similar to that in [21] can 
also be used to obtain convergence rate in separation distance using only 
eigenvalues (see [44]). But in the current paper we confine ourselves to the 
convergence rate in chi-square distance. The following three examples are 
special cases of the sequential Polya urn models. 

4.1.1. Convergence rate of the Moran process in population genetics. In 
brief, the classical Moran process in population genetics models the evolu- 
tion of a population of constant size by random replacement followed by 
mutation. Suppose there are d species in a population of size N. At each 
step, one individual is chosen uniformly to die and independently another 
is chosen uniformly to reproduce. They may be the same individual. If the 
latter is of species i, the offspring has probability ruij, 1 < j < d, to mutate 
to type j. Let = {Xti, . . . , X^d) be the vector of counts of species 1, . . . , d 
at time t. {Xf }t>o forms a Markov chain on X^. The size of the state space 
is \X^\ = {^~^^~^). Let X S X^; one-step transition probabilities are 





(4.3) 




K{x,y) = 0, otherwise. 
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This model {d = 2) is due to Moran [39]. Background and references can be 
found in the text by Ewens [18]. In many apphcations, the matrix M = {rriij} 
of mutation probabihties takes a special form 

(4.4) M = (l-m)I + m\ 

\Pi ■■■ PdJ 

where < m < 1 and (pi, . . . ,pd) is a probability vector. In words, the off- 
spring has probability m to mutate. If mutation happens, the offspring will 
change to species i with probability pi. Note when m = 1, the mutation 
matrix M has identical rows and the process degenerates into a multivari- 
ate Ehrenfest chain model, which is a special case in Section 4.3. Karlin 
and McGregor [36] gave the Karlin-McGregor spectral representation of the 
transition density for the continuous-time multivariate Moran model. Their 
version of multivariate Hahn polynomials are defined iteratively and essen- 
tially the same as (2.6). 

A natural question is how long it takes such a population to be totally 
mixed. In the univariate (d = 2) and continuous-time setting, Donnelly and 
Rodrigues [16] obtain an upper bound of order (A'' log N) /m {m = mi2 + m2i 
in the above notation) in the separation and total variation distances, when 
the process starts from Xq = 0. As shown below, for the chi-square distance, 
order N/m (constant being explicit) steps are necessary and sufficient. Con- 
vergence rate in the separation distance for the multivariate case is also 
available [44] but not presented here. 

Under the reparametrization Oi = , 1 < i < d, it is easy to check 
that the Moran process defined by (4.3) and (4.4) is exactly the same as 
the Polya level model with s = 1. This gives an intuitive explanation why 
the Moran process has Dirichlet-multinomial distribution as its stationary 
distribution. We remark that all the sequential Polya urn models admit an 
interpretation as a population genetics model. For example, a Polya level 
model with parameter s means that for a population of size N, at each step, 
s individuals are sequentially selected to reproduce and then mutate, then 
s individuals are randomly chosen from the old population (size A^) to die. 
The original Moran model can be thought of as a multivariate birth-death 
type process. This generalization allows for more dynamic change of the 
population at each generation. 

The next proposition gives the convergence rate of the Moran process 
when initially all individuals are of the same species. 

Proposition 4.7. Let K denote the Moran process specified by (4-3) 
and (4-4)- Then K is reversible with respect to the Dirichlet-multinomial 
distribution D>J(-|A^, a) (2.4), where Ui = > 0, l<i<d, and: 
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1. K has eigenvalue 

n(\a\ + n — 1) 

with multiplicity C^"*"^"^) corresponding eigenbasis {Qn}\n\=n! where 
Qyi are the multivariate Hahn polynomials (2.6). Particularly, 

/3o = 1 with multiplicity 1, 

3-1=1 ; ; — ^ with multiplicity d — 1. 

^ N{N + \cx\) ^ 

2. Suppose that initially all individuals are of species i. Then, for any c> 0, 

log[3(2 V \a\)N/{N + |a|)((|a| - a,)/ai V 1)] + c 



for I > 



forl< 



-21og(l-|a|/(Af(A^ + |a|))) 

log[3(2 V |a|)A^(|Q| - ai)/{{N + \a\)ai)] - c 
-21og(l-|a|/(iV(iV + |a|))) 



Proof. The first assertion is a direct corollary to Theorem 4.5 by setting 
s = 1 in (4.1). For the second assertion, by (4.2) and (2.18), 



2 (n_\^f, n{\a\+n-l) \ 
XNeAn-}^y- ^r(Ar + |a|) ) 

^ N[n] (|q| + 2re - l)(|Q:|)(„_i)(|a| - ai)(^n) 
(iV+|a|)(„) ?^Ka^)(n) 
The ratio of the (n + l)th summand to the nth is 

|Q!|+2n ^ 2i 



N{N + \a\) -n{\a\+n-l) 

N — n IqI + 2n + 1 |q;| + n — 1 IqI — + n 

X j — j — j 

+ |q;| + n |q;| + 2n — 1 n+1 ai + n 

/^s9; N ( |q;| \ / IqI — a,- \ 
<(/3i — ^3 IV— ^ ^ VI . 

With / > iog[3(2v|c.|)7V/(jV_+|g)((|.>|-aO/».vi)]+c ^ ^^.^ ^^^.^ bounded by e-72 < 
1/2 and hence 
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Chi-square distance of the Maran process 




200 400 600 800 1000 1200 1400 1600 1800 2000 
Steps 

Fig. 3. Chi-square distance of the Moran model with N = 20, d = 5, ai = 1/5, starting 
from Nei. 



^ 2(|a| + l) ^. 
- 3(2 V |q:|) 

With / < l°g[3(2V|«|)(|Qha,)/a,jV/(jV+|«|): 
^^^^'^^ - -21og/3i 



<e" 



Af(|a| + l)(|a|-a0^2«^ + l _ 1 , 



{N+\a\)ai 



—, — — ; — > -e" 
3 2VQ: -6 



□ 



Remark 4.8. In the biological case, is large and Nra is of constant 
order. For example, consider the case m = and V\ = " ' = Vd = \i or 
equivalently, ol\ = ■ ■ ■ = ad = \- When initially all individuals are of same 

type and iV is large, _2°iog(?Ivy(iv+ij)) ^ ^^^^^^^(^ + 1) ^teps are nec- 
essary and sufficient to drive the chi-square distance low. Figure 3 shows 
the decrease of the chi-square distance for the Moran process with = 20, 
d = 5, ai = 0.2. In this case, ^^^^^^N{N + 1) w 668. 



Remark 4.9. Consider the case a\ = ■ ■ ■ = a^ = \. The stationary distri- 
bution is uniform on X%. When N is large, If^^Z'^^^^tt]-,^ - ^^^^^ x 
N{N + d) steps are necessary and sufficient to drive the chi-square distance 
low. Figure 4 shows the decrease of the chi-square distance for the Moran 
process with = 20, d = 5, Oi = 1. In this case, ^°^^j[f^^^ iV(iV + d) 205. 
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4.1.2. Convergence rate of the local community process by Hubbell. A de- 
scription of the local community process by Hubbell is given in the Introduction. 
We observe that the process is almost the same as the Moran process ex- 
cept that the reproducing individual cannot be the same as the dying one. 
One-step transition probabilities are 

Y.k^j Xkruki + {xj - l)mji\ 

(4.5) l<i^j<d; 
K{-x,x) = 1 - ^/f(x,x + ej - Bj); 

ir(x,y)=0 otherwise, 

with the matrix of mutation probabilities same as (4.4). This is essentially 
the process prescribed in Hubbell's book [27], page 86, and simulated by 
McGill [38]. Again under parametrization Oi = , we find that the 

local community process by Hubbell is the same as the Polya down-up model 
with s = 1. The following proposition is similar to Proposition 4.7 and the 
proof is omitted. 

Proposition 4.10. Let K denote the local community process by Hubbell 
specified by (4-5) and (4-4)- Then K is reversible with respect to the Dirichlet- 
multinomial distribution X'A^(-|A^, a) (2.4), where ai = '^^^^^^^ > 0, 1 < 
i <d, and: 

Chi-square distance of the Moran process 
3 1 T 1 1 1 1 1 1 



K{x,x + ej - ej 



N 



2 1.5 



0.5 ■ 



100 



300 400 

Steps 



Fig. 4. Chi-square distance of the Moran model with N = 20, d = 5, ai = 1, starting 
from Nsi . 
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1. K has eigenvalue 

{N -n){N + \a\+n- 1) 



Pn 



N{N + \a\ - 1) 

n(n + \a\ — 1) 
iV(iV+ |a| - 1)' 



< n < A^, 



with multiplicity C^"^^ ^) and corresponding eigenhasis {(5n}|n|=n; where 
Qn CLTC the multivariate Hahn polynomials (2.6). Particularly, 

/So = 1 with multiplicity 1, 

(N-l)(N+\a\) \a\ 

Bi = — ; — ; ^ = 1 — — I — I T with multiplicity d — 1. 

^ iV(iV + |a|-l) A^(A^ + |q|-1) ^ ^ 

Suppose that initially all individuals are of species i. Then for any c > 0, 

log[3(2 V \a.\)N/{N + |a|)((|Q| - a^M V 1)] + c 



Xk(0<e-^ forl> 



-21og(l-|a|/(iV(iV+|a|-l))) 



2 m>iec ^^^^log[3(2V|a|)iV(|a|-a,)/((iV + |«|)a, 



6 ' - -21og(l- |Q|/(iV(iV + |a| -1))) 

Remark 4.11. In the ecological case, is large and Nm is of con- 
stant order. For example, consider the case m = and pi = • • • = Pd = ^, or 
equivalently, ai = • • • = = ^. When initially all individuals are of the same 
species and N is large, l°g((6^(^^~|:W/(^+i)) ~ log6(^d-i) g^^gpg g^^g necessary 
and sufficient to drive the chi-square distance low. 



We have seen a small simulation example in the Introduction. Let us 
look at another concrete example in [38] where the computation time of the 
simulation is prohibitive. To sample the stationary distribution of the local 
community of size N = 20,000 and migration probability m = 0.1 (corre- 
sponding to the famous Barro Colorado Island dataset in ecology), McGill 
first simulates a metacommunity with population size Nm and speciation 
probability s. Given a fixed metacommunity configuration. Proposition 4.10 
tells us how many steps need to be run for the local community to reach 
equilibrium. For example, for a metacommunity configuration with d = 300 
equally abundant species, 1.44 million steps are necessary and sufficient for 
the local community to reach equilibrium. This coincides with McGill's em- 
pirical findings ([38], Figure 1). If we assume that the tree mortality rate 
is 1% per year, then this translates into 7200 years for the Barro Colorado 
Island to reach equilibrium. 
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4.1.3. Convergence rate of the Dirichlet-multinomial Gibbs sampler. This 
is the multivariate generahzation of the canonical beta-binomial Gibbs sam- 
pler. The remarkable paper [11] gives explicit diagonalization and sharp con- 
vergence rates of Gibbs samplers for six exponential families. One of their 
motivating examples is how fast the classical beta-binomial Gibbs sampler 
converges ([11], Proposition 1). We give analogous results for the Dirichlet- 
multinomial Gibbs sampler here. 

Consider the two-component Gibbs sampler with Dirichlet prior vr(p) ~ 
'D{-\a) and multinomial likelihood /(x|p) ~ A^(-|A^, p). The posterior dis- 
tribution is again Dirichlet, that is, 7r(p|x) ~ P(-|x-|-q:). The Gibbs sampler 
iterates the following two steps: 

• From x, draw p from I'(-|x -|- a); 

• From p, draw y from A^(-|A^, p). 



The marginal x-chain of the Dirichlet-multinomial Gibbs sampler forms 

'd 

N 



a Markov chain with state space and transition probabilities 



i^(x,y)= / M{y\N,p)V{p\^ + a)dp 

(4.6) 



X, y fc '^N- 



\yj (A^+|a|)(;v) ' 

We find that the marginal x-chain (4.6) is actually the Polya level model 
with s = N. Again the proof of the following proposition is analogous to 
Proposition 4.7 and is omitted. 

Proposition 4.12. Let K denote the marginal x-chain (4-6) of the 
Dirichlet-multinomial Gibbs sampler. Then K is reversible with respect to 
the Dirichlet-multinomial distribution D^A{^\N ,01) (2.4), and: 

1. K has eigenvalue 

/9n= ,,,^Pi, , 0<n<iV, 
(A^+|a|)(„) 

with multiplicity C^"*"^"^) and corresponding eigenbasis {Qn}|n|=n; where 
Qn are the multivariate Hahn polynomials (2.6). Particularly, 

/3o = 1 with multiplicity 1, 

N 

Pi = ; — r with multiplicity d — 1. 

N -\- \a\ 
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2. Suppose that the starting state is Nei. Then for any c > 



xk(0> gS' 



^ 1 / log[3(2 V |q|)((|q| - ai)/ai V 1)] + c 
- 2^ log(iV/(iV + |a|)) 

1 / log[3(2 V |a|)(|Q| - a,)M] - c 
^ - 2^ log(iV/(iV + |a|)) 



Note that the joint chain {(Xf,Pt)}(>o is not reversible. Complete anal- 
ysis of the joint chain relies on a singular value decomposition presented in 
fill. 



Remark 4.13. Consider the case ai 



log6(rf-l) 



log6(d-l) 



■ = ad = 2- When N is large, 

-2iog{N/{N+i)) 2 (-^ ~^ ^) steps are necessary and sufficient to drive 

the chi-square distance low. Figure 5 shows the decrease of the chi-square 
distance for the Dirichlet-multinomial Gibbs sampler with = 20, d = 5, 



ai = 0.2. In this case, 



log6(d-l) 



(iV + 1) Ri33. 



Remark 4.14. Consider the case ai = • • • = = 1 (a uniform sampler 
on A'^). When N is large, _2!o|f^/(^+d)) ~ !2ii^Mzli(jv + d) steps are nec- 
essary and sufficient to drive the chi-square distance low. Figure 6 shows 
the decrease of the chi-square distance for the Dirichlet-multinomial Gibbs 
sampler with N = 20, d = 5, ai = 1. In this case, MM^(Ar + d) ^ 10. 



Chi-square distance of the Dirichlet-MulUnomial Gibbs sampler 




100 120 



Fig. 5. Chi-square distance of the Dirichlet-multinomial Gibbs sampler with N = 20, 
d = 5, ai = 1/5, starting from Nei. 
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Chi-square distance of the Diijchlel-Mulllncmial Gibbs summer 
3| J > . . 1 1 



2.5 - 




Steps 

Fig. 6. Chi-square distance of the Dirichlet-multinomial Gibbs sampler with N = 20, 
d = 5, Qi = 1, starting from Net . 

4.2. Generalized Bernoulli-Laplace models. In the classical Bernoulli- 
Laplace urn model (see, e.g., [31], Section 4.8.1), there are two urns con- 
taining 2N balls. Initially the left urn contains red balls; the right urn 
contains A'^ black balls. At each step, we first randomly choose one ball from 
the left urn and put it into the right urn. Then we randomly choose one ball 
from the right urn (it contains A'^ + 1 balls now) and put it into the left urn. 
If we track the number of red balls in the left urn, it forms a Markov chain 
with (univariate) hyper geometric distribution as stationary distribution. In 
this section, we study generalizations of this model in two directions. First, 
we allow balls to have more than two colors. Second, we allow more dynamic 
changes at each step. In analogy to the Polya urn models, we define three 
classes of Bernoulli-Laplace models. 

There is a batch of balls with composition t= {li, . . . ,1^), that is, li balls 
of color i for 1 <i <d, distributed in two urns. The left urn contains N < \£\ 
balls; the right urn contains \i\ — N balls. Recall that \i\ = X^iLi^i is the 
total number of balls. 

Definition 4.15. Bernoulli-Laplace level model: s is a parameter sat- 
isfying < s < min{A^, \£\ — N}. At each step, we randomly choose s balls 
from each urn and then switch them. Let = (Xfi, . . . ,Xtd) be the com- 
position of the balls in the left urn after t steps. The process {X.t}t>Q forms 
a multivariate Markov chain on ^ and is called a Bernoulli-Laplace level 
model with parameters £, N and s. 
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Definition 4.16. Bernoulli-Laplace down-up model: s is a parameter 
satisfying < s < A^. At each step, first we randomly choose s balls from the 
left urn and put them into the right urn. Then we randomly choose s balls 
from the right urn (it contains \£\ — N + s balls now) and put them into 
the left urn. Let = {Xn, . . . ^Xtd) be the composition of balls in the left 
urn after t steps. The process {X(}t>o forms a multivariate Markov chain 
on Xfj I and is called a Bernoulli-Laplace down-up model with parameters 
I, N and s. 

Definition 4.17. Bernoulli-Laplace up-down model: s is a parameter 
satisfying < s < A^. At each step, first we randomly choose s balls from 
the right urn and put them into the left urn. Then we randomly choose s 
balls from the left urn (it contains N + s balls now) and put them into the 
right urn. Let = {Xti, . . . , Xtd) be the composition of balls in the left urn 
after t steps. The process {Xi}t>o forms a multivariate Markov chain on 
Xfi I and is called a Bernoulli-Laplace up-down model with parameters t, 
N and s. 

The special case d = 2, s = 1 of the down- up model dates back to Bernoulli 
and Laplace who introduced this model to study diffusion of particles be- 
tween two containers. More details and historical background can be found in 
[20, 31]. Both [6] and [15] study convergence rates of the Bernoulli-Laplace 
level model with d = 2, s = 1 and contain interesting connections to real 
world problems. 

This process can be lifted to a random walk on the space of all A^-subsets 
of \£\ objects. For example, in the s = 1 case of the level model, at each step, 
randomly pick one element from the current set, one from the complement 
set and switch them. This is a nearest-neighbor random walk under the 
metric d{x,y) = N — \x r]y\. The stationary distribution is uniform over 
all A^-subsets. Therefore the Bernoulli-Laplace process has hypergeometric 
distribution as stationary distribution. See [6] for detailed analysis of the 
lifted chain. This point of view gives the following result. 

Lemma 4.18. All three classes of Bernoulli-Laplace models are reversible 
with respect to the multivariate hypergeometric distribution 7i{-\N,i) (2.5). 

The Bernoulli-Laplace up-down and down-up models have alternative 
interpretations as the marginal chains of a multivariate hypergeometric walk. 
The univariate hypergeometric walk was originally studied in [14] . Consider 
the following Gibbs sampler. Let £ = (/i, . . . , Z^) be a vector of counts and 
0<A^<A^ + s< The prior distribution is hypergeometric: 

T:{e)r.n{-\N + s,i), 
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that is, 6 = {01, ... , Ofi) is the composition of a random sample of size N -\- s 
from a pool of \l\ balls with composition i. The likelihood given parameter 

e is 

f{-K\e)^n{-\N,e), 

that is, X = (Xi, . . . ,Xd) is the composition of a random sample of size N 
from a pool of |0| = AT + s balls with composition 6. it{6) is the conjugate 
prior for the hypergeometric distribution and the posterior distribution is 
still hypergeometric: 

7r(0|x) ~x + W(-|s,£-x), 

that is, take a random sample of size s from a pool of \£\ — N balls with 
composition £ — x and set 9i to be the count of balls of color i plus Xj. The 
marginal x-chain has transition kernel 

K(x,y)=^^(0|x)/(y|0) 
e 

and is the same as a Bernoulli-Laplace up-down model with parameters 
i, N, s. The marginal 0-chain has transition kernel 

K{e,e')= /(x|0)7r(0'|x) 

and is the same as a Bernoulli-Laplace down-up model with parameters 
l,N + s,N. 

In analogy to the Polya urn models, these models share the same poly- 
nomial eigenfunctions which are the multivariate Hahn polynomials for the 
hypergeometric distribution. The following theorem is analogous to Theorem 
4.5 and the proof is omitted. 

Theorem 4.19. 1. The Bernoulli-Laplace level model has eigenvalue 

'n\ - s)[k]S[n-k] 



tSk) N[^^,^{\i\-N)[,^^ U_n_iV, 

with multiplicity o-i^d corresponding eigenbasis {(5n(x; N, — •^)}|n|=n; 

where Qn the multivariate Hahn polynomials for the hypergeometric 
distribution as defined in Proposition 2.3. 
The Bernoulli-Laplace down-up model has eigenvalue 

_ {N-s)U\l\-N),^, 
iV[„](|£|- AT + ' 
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with multiplicity l^^^^l and corresponding eigenbasis {Qn{'K;N, —£)}\n\=n! 
where Qn are the multivariate Hahn polynomials for the hypergeometric 
distribution as defined in Proposition 2.3. 

3. The Bernoulli-Laplace up-down model has eigenvalue 

^'^-(iV + .)[„](|£|-iV)[„]' 

with multiplicity \Xf^f\ and corresponding eigenbasis {(5n(x; A^, — £)}|n|=n; 
where Qn are the multivariate Hahn polynomials for the hypergeometric 
distribution as defined in Proposition 2.3. 

4. For all three classes of Bernoulli-Laplace models, the chi-square distance 
between K\-k,-) and stationary distribution is 

N 

X^(0 = E/^n/in(x,x), 
n=l 

where (3n are the eigenvalues for the corresponding process and hn are 
the kernel polynomials for the hypergeometric distribution, that is, (2.16) 
with Oi replaced by —U. 

Remark 4.20. For all three classes of the Bernoulli-Laplace urn mod- 
els, the eigenvalues depend on parameters £ = {li, . . . ,ld) only through 
Formulas for the eigenfunctions require complete knowledge of i. 

4.2.1. Convergence rate of the Bernoulli-Laplace down-up models. We 
specialize to the case in which at beginning the left urn contains balls of the 
same color. 

Proposition 4.21. For N <li and any c > 0, the chi-square distance 
between the Bernoulli-Laplace down-up model and the stationary distribu- 
tion, starting from Nei, satisfies 

log[\£\N{\£\-k)/m-N)li)]+c 



-21og[(iV - s){\£\ - N)/{N{\£\ -N + 1))] ' 



,2 ,n ^ 1 c , / log[|^|iV(|^| - k)/{{\i\ - N)k)] 



2 ' - -2\og[{N-s){\£\-N)/{N{\£\-N + !))]■ 

The proof is similar to that for the sequential Polya urn models and is 
omitted. This result is useful when s <^N . In the extreme case s = N, the 
chain achieves stationarity after one-step. 



Remark 4.22. Consider the multivariate version of the classical Bernoulli- 
Laplace model ([31], Section 4.8.1), where s = 1. Suppose the two urns have 
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C^i~square distance d the Bemoulli-Laplace process 




Fig. 7. Chi-square distance of the Bernoulli-Laplace process with N = 20, 
1^1 = 2N, k = N, starting from Net . 



= 5, 



equal sizes = \£\/2 and initially the left urn contains N balls of the same 
color and the right urn contains N balls of colors different from those in 
the left urn (Xq = A^e^, k = N). Then, for N large, _2iog('i-2/(A^+i)) ^ 

^^^^4'°^^^ steps are necessary and sufficient to drive the chi-square dis- 
tance low. Figure 7 shows the decrease of the chi-square distance for the 
Bernoulli-Laplace process with N = 20, d = 5, \i\ = 2N, li = N. In this case, 

{N+l)log2N 
4 



19. 



4.3. A generalized Ehrenfest urn model. In the classical Ehrenfest model 
(see, e.g., [20, 34]), a certain number of balls are shuttled between two urns. 
At each step one ball is randomly chosen and shifted to the other urn. 
The Ehrenfest chain tracks the number of balls in one of the urns. Refer- 
ence [35] contains a discussion of the Ehrenfest urn model with d> 2 urns. 
The discrete-time analog of their continuous-time Markov chain randomly 
chooses a single ball at each step and redistributes it to the d urns accord- 
ing to a probability vector p = {pi , . . . , pd) ■ The multivariate Ehrenfest chain 
tracks the counts in each urn. 

In this section, we generalize even further by selecting more balls at each 
step. Specifically, there are N indistinguishable balls distributed in d urns. 
s € {0, 1, . . . , N} is a parameter. At each step, we randomly choose s balls 
from the total of N balls and redistribute each of them independently ac- 
cording to the same probability vector p. Let Xu be the number of balls in 
the ith urn at time t. Then {X^ = {Xu, . . . ,Xtd)}t>o forms a multivariate 
Markov chain on X^. 
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It is interesting to observe that this generahzed Ehrenfest model has an 
alternative interpretation as the marginal chain of a Gibbs sampler. Consider 
a multinomial sampling model (sample size s) with a preexisting sample of 
size {N — s). The likelihood of the data given parameter E '^n-s 

f{^\e)^e + Mi-\s,p). 

If we specify the prior distribution for the parameter 6 as 

7r(6>) ~7W(-|iV-s,p), 

then by the Bayes formula, the posterior distribution of 6 given the data has 
probability mass function 



)( 



vr(0|x) = - . 
\e) 

Consider the Gibbs sampling procedure to sample from the joint distribution 
of (X,0). The Gibbs sampler iterates the following two steps: 

• From X, draw 6 from the posterior 7r(-|x). 

• From 0, draw y from the likelihood function f{-\0). 

The marginal x-chain has transition probabilities 

i^(x,y)= ^ 7r(0|x)/(y|0), x,yG^^, 

and is the same as the generalized Ehrenfest urn model described previously. 
Choosing from 7r(-|x) corresponds to choosing N — s balls which will not 
be redistributed (and hence s balls which will be redistributed). Choosing 
y from f{-\0) corresponds to redistributing s balls independently according 
to the same probability vector p. 

It is well known that the marginal x-chain of a Gibbs sampling Markov 
chain is reversible, with the marginal of the joint distribution of (X, G) as 
its stationary distribution. 

Lemma 4.23. The generalized Ehrenfest urn model is reversible with 
respect to the multinomial distribution A4{-\N,p) . 

The following result is again an application of Lemma 3.2. 

Theorem 4.24. 1. The generalized Ehrenfest urn model has eigenvalue 

/3n = ^^^^T^, 0<n<iV, 
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with multiplicity {'^~^^~'^) and corresponding eigenhasis {i^'n}|n|^„, where 

K^i are the multivariate Krawtchouk polynomials (2.10) and |n| = X]f=i "^i- 
In particular, 

/5o = 1 with multiplicity 1, 

g 

j3i = \ — — with multiplicity d—1. 

2. // the process starts from x, then the chi-square distance after I steps is 

N 

(4.7) Xx(0 = E/^n/in(x,x), 

n=l 

where hn are the kernel polynomials (2.22) for the multinomial distribu- 
tion M{-\N,-p)- 

We observe that, in terms of convergence to stationarity, the worst-case 
initial configuration is one of the d configurations where all the balls are in 
a single urn. 

Corollary 4.25. Suppose that initially all balls are in the ith urn. 
Then, for any c > 0, the chi-square distance of the generalized Ehrenfest 
chain satisfies 



-2\og{l- s/N) 

xii.M)>e' for I 



2 n^-^.c ^_^^log{N{l-pi)/p,)-c 



-21og(l - s/iV) 
Proof. Note /?„ = forn > iV - s. By (4.7) and (2.23), 

The inequality < (1 - ^)" impHes that 



Substituting / = we get 



NJ --^^ve.vz-y Pi \ 

(i-p»)M)+c 

2log{l- s/N) ' 

e-'=<xk(0<(l + ^j -^<e' -i- 

And substituting / = ^°^1.2iog{i~ IJn)''' ' S^* 

Xk(0>e^- □ 
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Cht-square distance of EfirenfesI chain 
3| 1 p 1 1 1 . 1 




20 40 60 80 100 120 140 160 180 200 
time 



Fig. 8. Chi-square distance of the Ehrenfest urn model with N — 20, s = 1, d — 5, 
Pi = 1/5, starting from Nei. 



This analysis is particularly useful when s is small compared to A^. In the 
extreme case s = N, the chain achieves stationarity in just one-step. 



Remark 4.26. Consider the discrete-time version of the multivariate 
Ehrenfest urn model in [35] where s = 1. For N large, ^'?2i^g(i"-i/jv)^ ~ 
Niog{N{i-pi)/pi) g^gpg g^j.g necessary and sufficient to drive the chi-square 
distance low. A probabilistic analysis of this chain is given in [10]. Figure 8 
shows the decrease of the chi-square distance for the Ehrenfest urn model 
with N = 20, s = l, d = 5,pi = 1/5. In this case, ^^°s{N{i-p,)/p,) _ 

4.4. Multivariate normal autoregressive process. Consider a multivariate 
normal autoregressive process on defined by 

(4.8) Xt = AXt^i+^„ t>l, 

where {$t}t>i are independent and identically distributed M{0,V). This 
process arises in the multigrid Monte Carlo method by Goodman and Sokal 
[22] and general overrelaxation MCMC algorithms [3, 4, 40]. First we check 
the stationary distribution of the process and the reversibility criterion. 



Proposition 4.27. The Markov chain (4-8) has a unique stationary 
distribution AA(0, S) if and only if V = Ti — AT^A^ . Moreover, when V = 
S — ATiA^ , the Markov chain is reversible if and only if AT^ = SA"^. 
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Proof. If ~ M{0, S), then Xt+i = AXt + ^^+1 ~ J\f{0, V + AY.A^). 
Therefore AA(0, S) is a stationary distribution if and only if S = F + ATiA^ . 
For the uniqueness, since is a positive definite matrix, 

the spectral radius of A is strictly less than 1. By iterating, Xt|Xo = x~ 
A^(^*x,E*=o^^'^(^^y)- But A*x^O and 

Y^A^v{A^y = Y,[A^i:{A^y - Ai+^Y.{A^y+^] 

3=0 i=o 

as f — > oo. Therefore AA(0, S) is the unique stationary distribution. 

To check the reversibility criterion, note the transition density and sta- 
tionary density are 

i^(x,y) = 1 ^(y-Ax)^y-^(y-Ax)/2^ 

^(y) = ^^=e-y^^"V/2, X, y G R'^. 
V(2^)'^|S| 

Hence, the Markov chain is reversible if and only if 
7r(x) A'(x, y) = 7r(y) (y, x) for all x, y 
^ (y-Ax)^y^^(y- Ax)+x^S~ix 

= (x - AyYv-\^ - Ay) + y^S" V for all x, y 





j:-^ + A'^v-^A = 


v-\ 


A^V'^ = V-^A 


<^ 




-1 


A^V^^ = V^^A 




v + a^t, = j:, 


A^V 


^^ = V-^A 






= s. 


VA^ = AV 


^ 


AT,A^ = A^S, 




- AT{A^f = AT- A^TA^ 




at,a'^ = A^S, 




= AT 


4^ 


AT, = TA^. 







In the following, we assume that V = T- ATA^ and AT. = TA^ . Hence, 
the Markov chain (4.8) has a unique stationary distribution vr ~ AA(0, T) 
and is reversible. Let Ai be the largest eigenvalue of A. In [22], Goodman 
and Sokal identify Ai as the rate of decay of the autocorrelation functions 
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of Xt. In [4], Barone and Prigessi identify Ai as the rate of variational norm 
convergence. Roberts and Sahu [40] show that for all / G Z^(7r) and r > Ai, 

Our contribution is to use the eigenfunctions of this Markov chain to 
obtain an exact expression for the chi-square distance from stationarity after 
I steps, for any ^ > (Proposition 4.30). This leads to nonasymptotic bounds 
for convergence to stationarity and strengthens earlier results. 

The idea is that the original autoregressive process can be transformed 
into another that can be easily analyzed. The condition AT, = TiA^ implies 
that is symmetric and thus orthogonally diagonalizable. Let 

its eigendecomposition, where P^P = I and D is 
the diagonal matrix containing the eigenvalues 1 > |Ai| > • • • > jA^j of A. We 
study the transformed Markov chain {Zt}, where 

Zt = P'^T.-'^/^Xt, t > 0. 

From (4.8), 

Zt = (P^S-V2^sV2p)z<_i + (P^S-i/2)^„ t > 0. 

Note that 

pTY,-y^AT,^/^P = D, 

Var(P^S-^/2^„) = P'^T.-^/^CT.-^'^P 

= P^S"V2(S-ASA^)S-V2p 

= I-D'^. 

It follows that 

(4.9) Zt = DZt^i + $,[, t>0, 

where ^[ are independent and identically distributed A^(0, / — D'^). Since D 
is a diagonal matrix with entries Ai, A2, . . . , A^ and components of are in- 
dependent, all components of the Zt chain proceed as independent univariate 
normal autoregressive processes, that is, for 1 < i < d, 

(4.10) Zt,^ = X^Zt_l,i+C't,i^ 

where 6.ti, t> 1, are independent and identically distributed M{0, 1 — A?). 

Univariate normal autoregressive processes are well studied. The ith com- 
ponent process (4.10) is reversible with respect to the standard normal dis- 
tribution M{0, 1) and has eigenvalues A", n > 0, with the Hermit polynomials 
{Hn}n>o (2.14) as the corresponding eigenfunctions. This spectral informa- 
tion is easily transferred to that of the product chain due to independence. 
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Lemma 4.28. Let {K',tt') be the Markov operator and stationary dis- 
tribution corresponding to the transformed Markov chain {Zt}t>o in (4-9). 
Then K' is reversible with respect to tt' ~ AA(0,/). Moreover, 

1. K' has eigenvalues 



(3n = f[XT, n = (ni,...,nrf)G<, 



i=l 



with corresponding eigenf unctions H^{^) = nf=i Hmi^), which satisfy 
the orthogonality relation 



2. The chi-square distance of K' , starting from state z G W^, after I steps is 



xlii) 



nti(i-Af) 



Proof. Let Ki denote the Markov operator of the ith component pro- 
cess. By mdependence between the component processes 



i=l 
d 



Then the first assertion follows. For calculation of the chi-square distance, 

d \2nil tt2 



xi{i) 



117^0 

11 2^ 2"«n,-! 



1. 



i=in,=o - - vnti(i-Af) 

The last equality follows from the multilinear generating function (2.15). □ 



It is trivial to check that the chi-square distance of the transformed chain 
is equal to that of the original chain K. This implies the following result. 
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Proposition 4.29. For the Markov chain (4-8) satisfying C = S — 
AJ:A^ and AT, = TA^ , the chi- square distance after I steps, starting from 
the state x £ W^, is 

xlii) = I , - 1, 

^nti(i-Af) 

where z = P-^S^-'^/^x. 

Clearly, all the eigenvalues (Ai,...,Arf) of A play a role in determining 
the speed of convergence. However, if one is willing to compromise a little 
on sharpness of the bounds, we can get a result only involving the largest 
eigenvalue Ai of A. 

Proposition 4.30. For the Markov chain (4-8), when starting from the 
state 0, 

XS(0<10.- /„.,>J|^,c>log(^ 

Proof. Note that Xo(0 < (1 " Af _ i. if / > -^^g^, then Af < 
e"72< 1/2. Hence, 

xl{l)<{l-\f)-'"^-l<{l + e-'Y-l 
< e'^^'^e-' < lOe-^ 
when c > log(d/2). For the lower bound, 

,2 



At' 



X5(0>(l-Af)-^/^-l>^. 



Hence, if/<r^: 



4.4.1. An example from image analysis. We now borrow an example 
from Bayesian image analysis discussed in [40] and [3]. An image x is a 
vector of size 256 corresponding to values on the 16 x 16 lattice of pixels. 
Roberts and Sahu [40] model x using a Gaussian prior density g given by 

/ \ — 5y^. .(xi—xi)^ 

g[x) cx e '^^-j'- ^' , 
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where 5 is a constant and i ~ j if Xi and xj are neighbors. Suppose we observe 
a corrupted image y instead of x. The value yi for each pixel i follows an 
independent Gaussian density with mean Xi and variance cr^. Let rij denote 
the number of neighbors of vertex i, 1 <i < 256. The posterior density of x 
is Gaussian with inverse covariance matrix Q given by 

{26ni + -^, i=j, 
-26, i^j, 
0, otherwise. 

Suppose we use the following reversible version of the Gibbs sampling 
Markov chain to sample from the posterior Af{0,Q~^) density. At every it- 
eration, sample Xi given the other coordinates, then sample X2 given the 
other coordinates, . . . , then sample given the other coordinates twice, 
then sample X^-i given the other coordinates,..., and finally sample Xi 
given the other coordinates. This version of the Gibbs sampler can be ex- 
pressed in the form (4.8) with 

A = WL^, 

V = WDW^ + {D + L^)-^D{D + L)-^, 

where D and L are the diagonal and lower triangular parts of the matrix Q, 
respectively, and W = {D + L^)~^L{D + L)^^. A satisfies the reversibility 
condition AQ~^ = Q~^A^ . For a concrete example, the largest eigenvalue of 
A when 6 = 100 and a = 0.5 is 0.9795. Proposition 4.30 tells us that Xo(^) < 
We-" for / = 8.3607 + 12.0620c (for any c > 4.8520), and XqW > e^/A for 
/ = 8.3607 — 12.0620c (for any c > 0). Figure 9 shows the decrease of the 
chi-square distance for this chain starting at 0. Note that eight steps of the 
Gibbs sampler correspond to 8 x 2 x 256 = 4096 mini sampling steps. 

5. Discussion. So far, probabilists have come up with various techniques 
of finding rates of convergence of Markov chains, which can be roughly 
grouped under five headings: 

(a) using the spectral decomposition of a Markov chain, 

(b) using Harris recurrence techniques (see [33]), 

(c) using probabilistic techniques such as coupling (see [42]), iterated ran- 
dom functions (see [9]) and strong stationary times (see [1, 7]), 

(d) using Nash inequalities (see [12]) or logarithmic Sobolev inequalities 
(see [13]), 

(e) using geometric techniques like Poincare and Cheeger's inequalities (see 
[8, 30]). 
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Chi-square distance of the tnultivariaie normal Gibtis sampler 
3 1 p 1 1 1 1 1 1 1 1 — 




Steps 

Fig. 9. Chi-square distance of the multivariate normal Gibbs sampler with d = 256, 
5 — 100, a — 0.5, starting from 0. 

We use technique (a), that is, using spectral decomposition of a Markov 
chain, for analyzing all the examples in this paper. An advantage of this tech- 
nique over the others is that, when applicable, it gives sharp and accurate 
results. But we require knowledge of all the eigenvalues and eigenfunctions 
of the Markov chain to apply this technique. In addition, the eigenfunctions 
have to be suitable for certain algebraic manipulations. This narrows the 
scope of this technique compared to others. In our examples, the eigenfunc- 
tions turn out to be polynomials. We deployed some machinery from the rich 
field of orthogonal polynomials and were able to compute exact rates of con- 
vergence for every Markov chain that we analyzed. But still our success was 
restricted to classes of natural but special starting points for every Markov 
chain. We could not appropriately manipulate the distance to stationarity 
of these Markov chains from a general starting point. Also, exact analysis 
of the nonreversible case for the multivariate normal autoregressive process 
remains open. This provides lots of future directions to go, but it is sobering 
to learn that even in these examples where we know all the eigenvalues and 
eigenvectors, it is hard, if not impossible, to have an exact analysis from a 
general starting point. To conclude, the theory of rates of convergence of 
Markov chains has a long way to go, but it is nice to see standard examples 
where exact analysis is available by present techniques. 
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